Instructions

Exercise

“Melbourne property prices have taken their biggest hit since 2012, falling by almost 2 per cent in the past three months” (Jim Malo, Domain, 26 Jul 2018)

This assignment explores the Melbourne house price data provided by Anthony Pino. The goal is to examine whether housing prices have cooled in Melbourne, and to help Anthony decide whether it is time to buy a two bedroom apartment in Northcote.

Your tasks

  1. Make a map of Melbourne showing the locations of the properties.
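A minimal sketch of such a map, assuming the data have been read into a data frame called `melb` (a hypothetical name) and noting that this dataset spells its coordinate columns `Longtitude` and `Lattitude`:

```r
library(tidyverse)

# Sketch only: `melb` is assumed to hold the Melbourne housing data.
# The coordinate columns really are misspelled in the source data.
ggplot(melb, aes(x = Longtitude, y = Lattitude)) +
  geom_point(alpha = 0.1, colour = "steelblue") +
  coord_quickmap() +   # keep an approximately correct aspect ratio
  labs(title = "Melbourne property locations")
```

A background map (for example via the leaflet or ggmap packages) makes suburbs easier to recognise, but plain points are enough to see the shape of the city.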

  1. Here we examine the prices of 2 bedroom units (flats) in Northcote.
    1. Filter the data to focus only on the records for Northcote units. Make a plot of Price by Date, faceted by number of bedrooms. The main thing to learn from this plot is that there are many missing values for the number of bedrooms.
    2. Impute the missing values using the regression method (covered in class). Make sure your predicted values are integers. Re-make the plot of Price by Date, faceted by number of bedrooms.
    3. Write a description of what you learn from the plot, particularly about the trend in 2 bedroom unit prices in Northcote.
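The filtering and imputation steps might be sketched as below, assuming the data are in a data frame `melb` (hypothetical name), that Date has been parsed to a Date class (e.g. with `lubridate::dmy()`), and that the imputation regression predicts Bedroom2 from Rooms — substitute whichever predictors were used in class:

```r
library(tidyverse)

northcote <- melb %>%
  filter(Suburb == "Northcote", Type == "u")   # "u" codes units in this data

# Regression imputation, one possible version: fit on the complete cases,
# predict for the missing rows, and round to an integer bedroom count.
fit <- lm(Bedroom2 ~ Rooms, data = northcote)
northcote <- northcote %>%
  mutate(Bedroom2 = if_else(is.na(Bedroom2),
                            round(predict(fit, newdata = northcote)),
                            Bedroom2))

ggplot(northcote, aes(x = Date, y = Price)) +
  geom_point() +
  facet_wrap(~ Bedroom2)
```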

  1. Focusing on 2 bedroom units, we are going to explore the trend in prices for each suburb.
    1. You will need to impute the Bedroom2 variable, in the same way as in the previous question.
    2. Fit a linear model to each suburb (the many-models approach). Collect the model estimates, along with the model fit statistics. Make a plot of intercept vs slope. Using plotly to explore the points interactively, which suburb has had the largest increase in prices, and which the biggest decrease?
    3. Summarise the \(R^2\) values for the model fits across all suburbs. Which suburbs have the worst-fitting models? Plot Price vs Date for the best-fitting model. Is the best-fitting model actually a good fit?
    4. Write a paragraph on what you have learned about the trend in property prices across Melbourne.
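The many-models step could look like the following sketch, assuming the filtered, imputed 2 bedroom unit records are in a data frame `units2` (hypothetical name) with a parsed Date column; `days` counts days since the first sale, so the slope is in dollars per day:

```r
library(tidyverse)
library(broom)

by_suburb <- units2 %>%
  mutate(days = as.numeric(Date - min(Date))) %>%
  group_by(Suburb) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(Price ~ days, data = .x)),
         coefs = map(model, tidy))

estimates <- by_suburb %>%
  unnest(coefs) %>%
  select(Suburb, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  rename(intercept = `(Intercept)`)

# Intercept vs slope; ggplotly() adds hover labels for the suburb names.
p <- ggplot(estimates, aes(x = intercept, y = days, label = Suburb)) +
  geom_point()
plotly::ggplotly(p)
```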

The per-suburb intercepts and slopes should look something like this:

# A tibble: 6 x 3
  Suburb         intercept  days
  <chr>              <dbl> <dbl>
1 Armadale        1425861.  383.
2 Balwyn          1735707.  376.
3 Bentleigh       1206824.  212.
4 Bentleigh East   965684.  394.
5 Brighton        1719286.  620.
6 Brunswick        903895.  167.
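The fit statistics for step c come from `broom::glance()`; a self-contained sketch, again assuming a `units2` data frame (hypothetical name) with a `days` variable as above:

```r
library(tidyverse)
library(broom)

# One row of fit statistics (r.squared, AIC, ...) per suburb.
fit_stats <- units2 %>%
  group_by(Suburb) %>%
  group_modify(~ glance(lm(Price ~ days, data = .x))) %>%
  ungroup()

fit_stats %>% arrange(r.squared) %>% head()        # worst-fitting suburbs
fit_stats %>% arrange(desc(r.squared)) %>% head()  # best-fitting suburb
```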

  1. Still focusing on apartments (units), examine the results of the auctions across suburbs, using the Method variable. This variable records the outcome of each auction: whether the property sold or not. It may be that in recent months a higher proportion of properties did not sell, which would put downward pressure on prices.
    1. Compute the counts of the levels of Method, ignoring suburb.
    2. The categories PI (passed in) and VB (vendor bid) indicate that the property did not sell. Compute the proportion of properties in these two categories for each suburb, for each month since 2016.
    3. Plot the proportions against year/month (make a new variable, time, an integer where 1 is the first month of the data in 2016 and each subsequent month increments time by 1). Add a smoother to show the trend in these proportions. Does it look like there is an increase in the proportion of units that aren’t selling?
    4. Explain why the data were aggregated to months before computing the proportions.
# A tibble: 9 x 2
  Method     n
  <chr>  <int>
1 S       8177
2 SP      2024
3 PI      2018
4 VB      1612
5 SN       480
6 PN       105
7 SA        73
8 W         56
9 SS        12
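Steps a-c might be sketched as follows, assuming the unit records are in a data frame `units` (hypothetical name) with a parsed Date column, and — for the time index — that the data start in January 2016:

```r
library(tidyverse)
library(lubridate)

units %>% count(Method, sort = TRUE)   # step a: overall counts

# Monthly proportion of auctions that did not sell (PI or VB), per suburb.
# time = 1 is assumed to be January 2016; adjust if the data start later.
not_sold <- units %>%
  filter(year(Date) >= 2016) %>%
  mutate(time = 12 * (year(Date) - 2016) + month(Date)) %>%
  group_by(Suburb, time) %>%
  summarise(prop = mean(Method %in% c("PI", "VB")), .groups = "drop")

ggplot(not_sold, aes(x = time, y = prop)) +
  geom_point(alpha = 0.3) +
  geom_smooth()   # trend in the unsold proportion
```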

  1. Fit the best model for Price that you can, for houses around Monash University.
    1. Impute the missing values for Bathroom (similarly to Bedroom2).
    2. Subset the data to these suburbs: “Notting Hill”, “Glen Waverley”, “Clayton”, “Clayton South”, “Oakleigh East”, “Huntingdale”, “Mount Waverley”.
    3. Make a scatterplot of Price vs Date, faceted by Bedroom2 and Bathroom, with a linear model overlaid. What do you notice? Only some combinations of bedrooms and bathrooms are common. Subset your data to houses with 3-4 bedrooms and 1-2 bathrooms.
    4. Using Date, Rooms, Bedroom2, Bathroom, Car and Landsize, build your best model for Price. There are some missing values in Car and Landsize, which may be important to impute. Think about interactions as well as main effects. (There are too many missing values to use BuildingArea and YearBuilt; the other variables in the data don’t make sense to use.)
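Tables like those below come from `broom::tidy()` and `broom::glance()` applied to candidate models. A sketch, assuming the house records are in a data frame `houses` (hypothetical name, already imputed and subset to 3-4 bedrooms and 1-2 bathrooms) with a `day` variable counting days since the first sale:

```r
library(tidyverse)
library(broom)

monash <- houses %>%
  filter(Suburb %in% c("Notting Hill", "Glen Waverley", "Clayton",
                       "Clayton South", "Oakleigh East", "Huntingdale",
                       "Mount Waverley"))

# Three candidate models: all main effects, a reduced model, and one
# with a Bedroom2:Bathroom interaction.
m1 <- lm(Price ~ day + Rooms + Bedroom2 + Bathroom + Car + Landsize,
         data = monash)
m2 <- lm(Price ~ day + Bathroom + Car + Landsize, data = monash)
m3 <- lm(Price ~ day + Bedroom2 * Bathroom + Car + Landsize, data = monash)

tidy(m1)
glance(m1)   # compare models on adj. R^2, AIC, BIC
```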

Main-effects model (day, Rooms, Bedroom2, Bathroom, Car, Landsize):

         term   estimate   std.error  statistic      p.value
1 (Intercept) 358544.822 178028.0026  2.0139799 4.564639e-02
2         day   -354.513    255.6064 -1.3869491 1.673395e-01
3       Rooms  56338.832 180046.5525  0.3129126 7.547446e-01
4    Bedroom2 -49394.920 183036.8344 -0.2698633 7.876047e-01
5    Bathroom 115734.550  53789.1839  2.1516324 3.288958e-02
6         Car -96446.474  28457.1343 -3.3891843 8.780415e-04
7    Landsize   1414.025    146.4015  9.6585424 9.387290e-18
  r.squared adj.r.squared    sigma statistic      p.value df   logLik
1 0.4067999     0.3850975 295131.7  18.74443 1.509531e-16  7 -2392.84
      AIC      BIC     deviance df.residual
1 4801.68 4826.813 1.428484e+13         164
Dropping Rooms and Bedroom2:

         term    estimate   std.error statistic      p.value
1 (Intercept) 383743.7662 133560.1245  2.873191 4.594819e-03
2         day   -350.7241    252.3450 -1.389859 1.664326e-01
3    Bathroom 117095.1690  46550.3764  2.515451 1.283828e-02
4         Car -95970.5951  28180.6997 -3.405543 8.281795e-04
5    Landsize   1405.0937    139.7697 10.052919 7.217317e-19
  r.squared adj.r.squared    sigma statistic      p.value df    logLik
1 0.4064021     0.3920986 293446.7  28.41265 5.502832e-18  5 -2392.897
       AIC      BIC     deviance df.residual
1 4797.795 4816.645 1.429442e+13         166
Adding a Bedroom2:Bathroom interaction:

               term     estimate   std.error statistic      p.value
1       (Intercept) -609336.1091 591487.3417 -1.030176 3.044443e-01
2               day    -335.0772    253.6054 -1.321255 1.882569e-01
3          Bedroom2  301751.1724 178210.5853  1.693228 9.231066e-02
4          Bathroom  709500.7906 347548.8727  2.041442 4.280941e-02
5               Car  -91831.0578  28315.5849 -3.243128 1.432490e-03
6          Landsize    1435.9937    142.0554 10.108685 5.619832e-19
7 Bedroom2:Bathroom -181890.5261 104950.4689 -1.733108 8.495607e-02
  r.squared adj.r.squared    sigma statistic      p.value df    logLik
1 0.4171212     0.3957964 292552.9  19.56035 3.757403e-17  7 -2391.339
       AIC      BIC    deviance df.residual
1 4798.679 4823.812 1.40363e+13         164

Grading

Points for the assignment will be based on: